Proteins Sequence Alignment
Abstract: Alignment of biological sequences such as DNA, RNA, or proteins is one of the most widely used tools in computational bioscience, and multiple protein sequence alignment (MSA) is an important research topic in bioinformatics. Because exact methods for MSA have exponential time complexity, heuristic approaches and progressive alignment are the most commonly used for multiple sequence alignment. In the progressive alignment strategy, choosing and merging the most similar sequences is one of the key steps. Information theory provides a suitable similarity measure in the form of mutual information (MI). In this paper, we propose a modification of the progressive alignment strategy based on mutual information. We define a distance between sequences based on mutual information and use it to construct a distance matrix, in which each row holds the distances between one sequence and all the others. A guide tree is then built from this distance matrix. The preliminary distance matrix is obtained in the first step without any pairwise alignment. The principal contribution of this paper is this modification of the first step of the basic progressive alignment strategy, i.e., the computation of the distance matrix, which yields a new guide tree. Such a guide tree is simple to implement and performs well. Tests on all datasets of the BAliBASE 3.0 database show that the proposed strategy is as good as ClustalW in most cases.
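The abstract does not state the exact distance formula, so the following is only a minimal sketch of how an alignment-free, MI-based distance matrix could be computed. It rests on two assumptions that are not taken from the paper: residues are paired position by position over the common length of the two sequences, and the distance is the normalized form 1 - I(A;B) / H(A,B). The function names `mi_distance`, `distance_matrix`, and `closest_pair` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts, total):
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mi_distance(a, b):
    """Assumed distance: 1 - I(A;B) / H(A,B).

    Residues are paired position by position over the common length,
    without any alignment (an assumption, not the paper's definition).
    """
    n = min(len(a), len(b))
    if n == 0:
        return 1.0
    a, b = a[:n], b[:n]
    h_a = entropy(Counter(a), n)
    h_b = entropy(Counter(b), n)
    h_ab = entropy(Counter(zip(a, b)), n)
    if h_ab == 0.0:
        # Both sequences are constant: treat them as identical.
        return 0.0
    mutual_information = h_a + h_b - h_ab
    return 1.0 - mutual_information / h_ab


def distance_matrix(seqs):
    """Symmetric matrix; row i holds the distances from sequence i to all others."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = mi_distance(seqs[i], seqs[j])
    return d


def closest_pair(d):
    """Indices of the two most similar sequences, i.e. the first pair to merge."""
    return min(combinations(range(len(d)), 2), key=lambda p: d[p[0]][p[1]])


if __name__ == "__main__":
    toy = ["AAAACCCC", "AAAACCCC", "ACACACAC"]  # toy data, not from BAliBASE
    matrix = distance_matrix(toy)
    print(matrix)
    print(closest_pair(matrix))
```

A guide tree would then be obtained by repeatedly merging the closest entries of this matrix with a clustering method; the abstract does not say which method the authors use. Note that mutual information measures statistical dependence rather than identity, so this toy distance is only a rough stand-in for the measure defined in the paper.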
Publication date: 2012